Scalable BLAS 2 and 3 Matrix Multiplication for Sparse Banded Matrices on Distributed Memory MIMD Machines
Abstract
In this paper, we present two algorithms for sparse banded matrix-vector and sparse banded matrix-matrix product operations on distributed memory multiprocessor systems that support mesh and ring interconnection topologies. We also study the scalability of these two algorithms. We employ systolic-type techniques to eliminate synchronization delay and minimize the communication overhead among processors. The performance of the algorithms developed for these operations depends on the bandwidth of the matrices involved. The algorithms have been implemented on the NCUBE II with 64 processors. Our preliminary experimental data agree with the expected theoretical behavior.
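To illustrate why the cost of a banded matrix-vector product depends on the bandwidth, here is a minimal serial sketch in Python. It is not the distributed systolic algorithm of the paper: the function name `banded_matvec` and the diagonal-wise storage (a dictionary from diagonal offset to entries) are assumptions made only for this example. The work done is roughly the matrix order times the number of stored diagonals.

```python
def banded_matvec(diags, x):
    """Compute y = A @ x for a banded n-by-n matrix A.

    diags maps a diagonal offset k (0 = main, k > 0 above, k < 0 below)
    to a list holding that diagonal's entries A[i][i + k], ordered by
    increasing row index i. This storage layout is an assumption of the
    sketch, not something taken from the paper.
    """
    n = len(x)
    y = [0.0] * n
    for k, d in diags.items():
        # Rows i for which column i + k lies inside the matrix.
        lo = max(0, -k)
        hi = min(n, n - k)
        for i in range(lo, hi):
            y[i] += d[i - lo] * x[i + k]
    return y


# Tridiagonal example: 2 on the main diagonal, -1 on both neighbours.
n = 5
diags = {
    -1: [-1.0] * (n - 1),
    0: [2.0] * n,
    1: [-1.0] * (n - 1),
}
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = banded_matvec(diags, x)
print(y)  # [0.0, 0.0, 0.0, 0.0, 6.0]
```

Only the three stored diagonals are visited, so entries outside the band cost nothing. In a distributed setting, the same locality suggests that a processor holding a block of rows needs vector entries only from nearby blocks.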
Related resources
PB-BLAS: a set of parallel block basic linear algebra subprograms
We propose a new software package which would be very useful for implementing dense linear algebra algorithms on block-partitioned matrices. The routines are referred to as block basic linear algebra subprograms (BLAS), and their use is restricted to computations in which one or more of the matrices involved consists of a single row or column of blocks, and in which no more than one of the matr...
A Parallel Computational Kernel for Sparse Nonsymmetric Eigenvalue Problems on Multicomputers
The aim of this paper is to present an effective reorganization of the nonsymmetric block Lanczos algorithm that is efficient, portable, and scalable on multiple instruction, multiple data (MIMD) distributed memory message passing architectures. The basic operations implemented here are matrix-matrix multiplications, possibly with a transposed and a sparse factor, LU factorisation and triangular systems sol...
Techniques for Parallel Manipulation of Sparse Matrices
New techniques are presented for the manipulation of sparse matrices on parallel MIMD computers. We consider the following problems: matrix addition, matrix multiplication, row and column permutation, matrix transpose, matrix-vector multiplication, and Gaussian elimination.
Elimination Forest Guided 2D Sparse LU Factorization
Sparse LU factorization with partial pivoting is important for many scientific applications, and delivering high performance for this problem is difficult on distributed memory machines. Our previous work has developed an approach called S* that incorporates static symbolic factorization, supernode partitioning, and graph scheduling. This paper studies the properties of elimination forests and uses the...
A Fast Scalable Universal Matrix Multiplication Algorithm on Distributed-Memory Concurrent Computers
We present a fast and scalable matrix multiplication algorithm for distributed memory concurrent computers whose performance is independent of the data distribution on processors, and call it DIMMA (Distribution-Independent Matrix Multiplication Algorithm). The algorithm is based on two new ideas; it uses a modified pipelined communication scheme to overlap computation and communication effectivel...
Publication date: 2007